\section{Gaussian process submodels}
\label{sec:gp-sub} 

\pkg{PyMC} \citep{pymc} represents random variables as objects of class \code{Stochastic}. For example, the following code produces a normally-distributed random variable, labelled \code{'a'}, with mean zero and precision one:
\begin{CodeChunk}
\begin{CodeInput}
a = pymc.Normal('a',0,1)
\end{CodeInput}
\end{CodeChunk}
The current value of variable \code{a} can be queried and updated as follows:
\begin{CodeChunk}
\begin{CodeInput}
print a.value
a.value = 0.5
\end{CodeInput}
\end{CodeChunk}
The logarithm of \code{a}'s probability density function, evaluated at its current value, can be obtained via \code{a.logp}. PyMC has a large library of optimized log-probability density/mass functions, and readily incorporates new functions created by users.

PyMC variables can serve as parents of other variables; for example, the following code produces a normally-distributed variable labelled \code{'b'} whose mean is \code{a}:
\begin{CodeChunk}
\begin{CodeInput}
b = pymc.Normal('b',a,1)    
\end{CodeInput}
\end{CodeChunk}

Any Python function can be used to transform random variables via objects of class \code{Deterministic}. For example, the following code produces a variable labelled \code{'c'} whose value is the square of \code{b} 's value:
\begin{CodeChunk}
\begin{CodeInput}
@pymc.deterministic
def c(b=b):
    return b**2
\end{CodeInput}
\end{CodeChunk}
Deterministic variables can be used as parents of other variables, stochastic or deterministic. 

Because it allows any \proglang{Python} function to be used as a log-probability density/mass function or variable transformation, PyMC supports a very large model space. This package enlarges that model space to include Gaussian processes via the \code{GaussianProcess} object, which is a random variable whose value is a \code{Realization} object.

Because a Gaussian process is an infinite-dimensional variable, it is not feasible to compute anything like a probability density function for it \footnote{A density with respect to a base measure could be used, but would introduce new complications. For example, Gaussian process measures with the Mat\`ern covariance function with different degrees of differentiability are not mutually continuous. \textbf{ref}}. \code{GaussianProcess}es therefore have no \code{logp} attribute, and \code{GaussianProcess}es cannot be handled by \pkg{PyMC}'s standard MCMC machinery. 

However, the evaluation $f(x_*)$ of Gaussian process $f$ on mesh $x_*$ is a simple multivariate normal random variable, which has a \code{logp} attribute and can be handled by the standard machinery. If $f(x_*)$ is incorporated in the model as a variable, a minor extension to the standard machinery (implemented by this package) makes it possible to handle $f$ itself as well. Pairs of $f$ and $f(x_*)$ variables are housed in container objects of class \code{GPSubmodel}.

\subsection{Example: nonparametric regression with unknown mean and covariance parameters}\label{sub:BasicMCMC}

A Gaussian process submodel is created in \code{pymc/examples/gp/PyMCmodel.py} with the following call:
\begin{CodeChunk}
\begin{CodeInput}
sm = gp.GPSubmodel('sm',M,C,fmesh)
\end{CodeInput}
\end{CodeChunk}
There are two stochastic variables in the submodel: \code{sm.f} and \code{sm.f_eval}. The first is the actual Gaussian process $f$: a stochastic variable valued as a \code{Realization} object. The second is $f(x_*)$, where $x_*$ is the input argument \code{fmesh}.

Once the Gaussian process submodel has been created, other variables depend on $f$ and $f(x_*)$ in any way that can be expressed in \proglang{Python} code. In \code{\pkg{PyMC}model.py}, the observation $d$ depends on $f(x_*)$ in a very standard way: 
\begin{CodeChunk}
\begin{CodeInput}
d = pymc.Normal('d',mu=sm.f_eval, tau=1./V, value=init_val, observed=True)
\end{CodeInput}
\end{CodeChunk}
In section \textbf{ref}, a \code{GaussianProcess}'s children depend on it in a highly nonstandard way. 

% The full probability model is shown as a directed acyclic graph in figure \ref{fig:unobservedModel}. It illustrates the conditional independence relationships between the variables in a GP submodel.

The file \code{pymc/examples/gp/MCMC.py} fits the probability model created in \code{\pkg{PyMC}model.py} using MCMC. The part of the file that actually dispatches the MCMC is very simple:
\begin{CodeChunk}
\begin{CodeInput}
GPSampler = MCMC(PyMCmodel)
GPSampler.isample(iter=5000,burn=1000,thin=100)    
\end{CodeInput}
\end{CodeChunk}
The file's output is shown in figure \ref{fig:MCMCOutput}. Note that after the MCMC run \code{GPSampler.trace('sm_f')[:]} yields a sequence of \code{Realization} objects, which can be evaluated on new arrays for plotting. GP realizations can even be tallied on disk using the HDF5 backend via \pkg{PyTables} \citep{tables}, see \cite{pymc}.

\begin{figure}
    \centering
        \epsfig{file=figs/gibbsSamples.pdf,width=8cm}
        % \epsfig{file=figs/metroSamples.pdf,width=10cm}
    \caption{The output of \code{pymc/examples/gp/MCMC.py}. The left-hand panel shows all the samples generated for the Gaussian process $f$, and the right-hand panel shows the trace of $f(0)$.}
    \label{fig:MCMCOutput}
\end{figure}
 

\section{Step methods}
\label{sec:step-methods} 
Since $f$ has no \code{logp} attribute, the Metropolis-Hastings family of step methods \citep{pymc} cannot be used to update $f$, $f(x_*)$ or any of their mean or covariance parameters. This package uses a relatively simple work-around that will be described here. Throughout this section, the parents of $f$ are denoted $P$ and the children $K$.


\subsection{Step methods that handle parents of Gaussian processes}
If a probability density function for $f$ were available, the Metropolis-Hastings acceptance ratio for a proposed value $P_p$ of the parents \emph{and} a proposed value $\tilde f$ for $f$ would be:
\begin{eqnarray*}
    \frac{p(K|f_p)\ p(f_p|P_p)\ q(P)}{p(K|f)\ p(f|P)\ q(P_p)}
\end{eqnarray*}
where $q$ denotes the proposal density. Now, suppose we proposed a value for $f$ conditional on the proposed values for the parents $P$ \emph{and} $f(x_*)$. The new acceptance ratio would become
\begin{eqnarray*}
    \frac{p(K|f_p)\ p(f_p|f(x_*), P_p)\ p(f(x_*) | P_p)\ q(f_p|f(x_*),f_p, P_p)\ q(P)}{p(K|f)\ p(f|f(x_*), P)\ p(f(x_*) | P)\ q(f_p|f(x_*),f,P)\ q(P_p)}
\end{eqnarray*}
 The problematic terms are the ones with $f$ or $f_p$ in the consequent position:
\begin{eqnarray*}
    p(f_p|f(x_*), P),\\ q(f|f(x_*),f_p,P),\\ p(f|f(x_*), P),\\ q(f_p|f(x_*),f,P_p),
\end{eqnarray*}
These can be made to cancel by choosing a proposal distribution as follows:
\begin{eqnarray*}
    q(f_p|f(x_*),f,P_p) = p(f_p|f(x_*), P).
\end{eqnarray*}
In other words, if $f$ is proposed from its prior distribution conditional on $f(x_*)$ and its parents whenever $f(x_*)$ is proposed, the intractable terms don't have to be computed. This argument can be made more rigorous by replacing $f$ with its evaluation at all the points at which its value would ever be needed.

\smallskip
To summarize, any Metropolis-Hastings step method can handle the parents of $f$, as well as $f(x_*)$, if it proposes values for $f$ jointly with its target variable as outlined above. 

This minor alteration in jumping strategy is implemented by the function \code{wrap_metropolis_for_gp_parents}, which takes a subclass of \code{Metropolis} as an argument and returns a new step method class with altered \code{propose} and \code{reject} methods. The function automatically produces modified versions of all Metropolis step methods in \pkg{PyMC}'s step method registry (\code{Metropolis}, \code{AdaptiveMetropolis}, etc.) \citep{pymc}. The modified step methods are automatically assigned to parents of Gaussian processes.

\subsection{Choosing a mesh} 

The mesh points $x_*$ are the points where Metropolis-Hastings step methods can `grab' the value of $f$ to moderate the variance of its proposal distribution. If $x_*$ is an empty array, $f$'s value will be proposed from its prior, and rejection rates are likely to be quite large. If $x_*$ is too dense, on the other hand, computation of the log-probability of $f(x_*)$ will be expensive, as it scales as the cube of the number of points in the mesh.This continuum is illustrated in figure \ref{fig:meshpropose}. Finding the happy medium requires some experimentation.

If $f$'s children depend on its value only via its evaluation on the mesh, the likelihood terms $p(K|f_p)$ and $p(K|f)$ will cancel. In other words, if the mesh is chosen so that $p(K|f)=p(K|f(x_*))$ then the proposed value of $f$ will have no bearing on the acceptance probability of the proposed value of $f(x_*)$ or of the parents $P$. This is the situation in \code{\pkg{PyMC}Model.py}. Such a mesh choice will generally improve the acceptance rate.

\begin{figure}
    \centering
        \epsfig{file=figs/nomeshpropose.pdf,width=4cm}
        \epsfig{file=figs/lightmeshpropose.pdf,width=4cm}
        \epsfig{file=figs/densemeshpropose.pdf,width=4cm}
    \caption{Several possible proposals of $f$ (curves) given proposed values for $f(x_*)$ (heavy dots) with no mesh (top), a sparse mesh (middle), and a dense mesh (bottom). Proposal distributions' envelopes are shown as shaded regions, with means shown as broken lines. With no mesh, $f$ is proposed from its prior and the acceptance rate will be very low. A denser mesh permits a high degree of control over $f$, but computing the log-probability will be more expensive.}
    \label{fig:meshpropose}
\end{figure}

\subsection{Gibbs steps} 
If all of $f$'s children, $K$, depend on it as follows:
\begin{eqnarray*}
    K_i|f \stackrel{\textup{\tiny ind}}{\sim} \textup{Normal}(f(x_{*i}),V_i)
\end{eqnarray*}
then $f(x_*)$ can be handled by the \code{GPEvaluationGibbs} step method. This step method is used in \code{MCMC.py}:
\begin{CodeChunk}
\begin{CodeInput}
GPSampler.use_step_method(gp.GPEvaluationGibbs, GPSampler.submod, \
    GPSampler.V, GPSampler.d)
\end{CodeInput}
\end{CodeChunk}
The initialization arguments are the Gaussian process submodel that contains $f$, the observation variance of $f$'s children, and the children, in this case the vector-valued normal variable $d$. 

\code{GPEvaluationGibbs} covers the standard submodel encountered in geostatistics, but there are many conjugate situations to which it does not apply. If necessary, special step methods can be written to handle these situations. If \code{GPEvaluationGibbs} is not assigned manually, $f(x_*)$ will generally be handled by a wrapped version of \code{AdaptiveMetropolis} \citep{pymc}, which implements the algorithm of Haario, Saksman and Tamminenn \textbf{ref}.





\section{Geostatistical example}\label{sub:geostat}
\begin{figure}
    \centering
        \epsfig{file=figs/elevmean.pdf, width=5cm}
        \epsfig{file=figs/elevvar.pdf, width=5cm}
    \caption{The posterior mean and variance surfaces for the $v$ variable of the Walker lake example. The posterior variance is relatively small in the neighborhood of observations, but large in regions where no observations were made.}
    \label{fig:walker}
\end{figure}
\begin{figure}
    \centering
        \epsfig{file=figs/elevdraw0.pdf, width=5cm}
        \epsfig{file=figs/elevdraw1.pdf, width=5cm}
    \caption{Two realizations from the posterior distribution of the $v$ surface for the Walker Lake example. Elevation is measured in meters.}
    \label{fig:walkerreal}
\end{figure}
Bayesian geostatistics is demonstrated in the folder \code{pymc/examples/gp/more_examples/Geostatistics}. File \code{getdata.py} downloads the Walker Lake dataset of Isaaks and Srivastava \citep{isaaks} from the internet and manipulates the $x$ and $y$ coordinates into the array format described in section \ref{sec:highdim}. File \code{model.py} contains the geostatistical model specification, which is

\begin{eqnarray*}
    d|f \sim \textup{Normal}(f(x),V)\\
    f|M,C \sim \textup{GP}(M,C) \\
    M:x\rightarrow m\\
    C:x,y,\mathtt{amp},\mathtt{scale},\mathtt{diff\_degree}\rightarrow \mathtt{matern.euclidean}(x,y;\mathtt{amp},\mathtt{scale},\mathtt{diff\_degree})\\
    p(m)\propto 1\\
    \mathtt{amp}\sim \textup{Exponential}(7e-5) \\
    \mathtt{scale}\sim \textup{Exponential}(4e-3) \\
    \mathtt{diff\_degree}\sim \textup{Uniform}(.5,2)\\ 
    V\sim \textup{Exponential}(5e-9)\\
\end{eqnarray*}

File \code{mcmc.py} fits the model and produces output maps.  The output of \code{mcmc.py} is shown in figures \ref{fig:walker} and \ref{fig:walkerreal}. Figure \ref{fig:walker} shows the posterior mean and variance of the $v$ variable of the dataset, which is a function of elevation (see \cite{isaaks}, appendix A).